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Abstract 

We propose a novel method for constructing wavelet transforms of 
functions denned on the vertices of an arbitrary finite weighted graph. 
Our approach is based on defining scaling using the the graph analogue 
of the Fourier domain, namely the spectral decomposition of the discrete 
graph Laplacian C. Given a wavelet generating kernel g and a scale pa- 
rameter t, we define the scaled wavelet operator T* = g(tC). The spectral 
graph wavelets are then formed by localizing this operator by applying 
it to an indicator function. Subject to an admissibility condition on g, 
this procedure defines an invertible transform. We explore the localiza- 
tion properties of the wavelets in the limit of fine scales. Additionally, we 
present a fast Chebyshev polynomial approximation algorithm for com- 
puting the transform that avoids the need for diagonalizing C. We high- 
light potential applications of the transform through examples of wavelets 
on graphs corresponding to a variety of different problem domains. 

1 Introduction 

Many interesting scientific problems involve analyzing and manipulating struc- 
tured data. Such data often consist of sampled real-valued functions defined 
on domain sets themselves having some structure. The simplest such examples 
can be described by scalar functions on regular Euclidean spaces, such as time 
series data, images or videos. However, many interesting applications involve 
data defined on more topologically complicated domains. Examples include 
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data defined on network-like structures, data defined on manifolds or irregu- 
larly shaped domains, and data consisting of "point clouds" , such as collections 
of feature vectors with associated labels. As many traditional methods for signal 
processing are designed for data defined on regular Euclidean spaces, the devel- 
opment of methods that are able to accommodate complicated data domains is 
an important problem. 

Many signal processing techniques are based on transform methods, where 
the input data is represented in a new basis before analysis or processing. One 
of the most successful types of transforms in use is wavelet analysis. Wavelets 
have proved over the past 25 years to be an exceptionally useful tool for signal 
processing. Much of the power of wavelet methods comes from their ability to 
simultaneously localize signal content in both space and frequency. For signals 
whose primary information content lies in localized singularities, such as step 
discontinuities in time series signals or edges in images, wavelets can provide a 
much more compact representation than either the original domain or a trans- 
form with global basis elements such as the Fourier transform. An enormous 
body of literature exists for describing and exploiting this wavelet sparsity. We 
include a few representative references for applications to signal compression 
[1, 2, 3, 4, 5], denoising [6, 7, 8, 9, 10], and inverse problems including deconvo- 
lution [11, 12, 13, 14, 15]. As the individual waveforms comprising the wavelet 
transform are self similar, wavelets are also useful for constructing scale invariant 
descriptions of signals. This property can be exploited for pattern recognition 
problems where the signals to be recognized or classified may occur at different 
levels of zoom [16]. In a similar vein, wavelets can be used to characterize fractal 
self-similar processes [17]. 

The demonstrated effectiveness of wavelet transforms for signal processing 
problems on regular domains motivates the study of extensions to irregular, non- 
euclidean spaces. In this paper, we describe a flexible construction for defining 
wavelet transforms for data defined on the vertices of a weighted graph. Our 
approach uses only the connectivity information encoded in the edge weights, 
and does not rely on any other attributes of the vertices (such as their positions 
as embedded in 3d space, for example). As such, the transform can be defined 
and calculated for any domain where the underlying relations between data 
locations can be represented by a weighted graph. This is important as weighted 
graphs provide an extremely flexible model for approximating the data domains 
of a large class of problems. 

Some data sets can naturally be modeled as scalar functions defined on 
the vertices of graphs. For example, computer networks, transportation (road, 
rail, airplane) networks or social networks can all be described by weighted 
graphs, with the vertices corresponding to individual computers, cities or people 
respectively. The graph wavelet transform could be useful for analyzing data 
defined on these vertices, where the data is expected to be influenced by the 
underlying topology of the network. As a mock example problem, consider 
rates of infection of a particular disease among different population centers. As 
the disease may be expected to spread by people traveling between different 
areas, the graph wavelet transform based on a weighted graph representing the 
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transportation network may be helpful for this type of data analysis. 

Weighted graphs also provide a flexible generalization of regular grid do- 
mains. By identifying the grid points with vertices and connecting adjacent 
grid points with edges with weights inversely proportional to the square of the 
distance between neighbors, a regular lattice can be represented with weighted 
graph. A general weighted graph, however, has no restriction on the regularity 
of vertices. For example points on the original lattice may be removed, yielding 
a "damaged grid", or placed at arbitrary locations corresponding to irregular 
sampling. In both of these cases, a weighted graph can still be constructed that 
represents the local connectivity of the underlying data points. Wavelet trans- 
forms that rely upon regular spaced samples will fail in these cases, however 
transforms based on weighted graphs may still be defined. 

Similarly, weighted graphs can be inferred from mesh descriptions for geo- 
metrical domains. An enormous literature exists on techniques for generating 
and manipulating meshes; such structures are widely used in applications for 
computer graphics and numerical solution of partial differential equations. The 
transform methods we will describe thus allow the definition of a wavelet trans- 
form for data defined on any geometrical shape that can be described by meshes. 

Weighted graphs can also be used to describe the similarity relationships 
between "point clouds" of vectors. Many approaches for machine learning or 
pattern recognition problems involve associating each data instance with a col- 
lection of feature vectors that hopefully encapsulate sufficient information about 
the data point to solve the problem at hand. For example, for machine vi- 
sion problems dealing with object recognition, a common preprocessing step 
involves extracting keypoints and calculating the Scale Invariant Feature Trans- 
form (SIFT) features [18]. In many automated systems for classifying or retriev- 
ing text, word frequencies counts are used as feature vectors for each document 
[19]. After such feature extraction, each data point may be associated to a fea- 
ture vector v m G 1^, where N may be very large depending on the application. 
For many problems, the local distance relationships between data points are 
crucial for successful learning or classification. These relationships can be en- 
coded in a weighted graph by considering the data points as vertices and setting 
the edge weights equal to a distance metric A mjn = d(v m , v n ) for some function 
d : R N x R N — > R. The spectral graph wavelets applied to such graphs derived 
from point clouds could find a number of uses, including regression problems 
involving learning or regularizing a scalar function defined on the data points. 

Classical wavelets are constructed by translating and scaling a single "mother" 
wavelet. The transform coefficients are then given by the inner products of the 
input function with these translated and scaled waveforms. Directly extending 
this construction to arbitrary weighted graphs is problematic, as it is unclear 
how to define scaling and translation on an irregular graph. We approach this 
problem by working in the spectral graph domain, i.e. the space of eigenfunc- 
tions of the graph Laplacian C. This tool from spectral graph theory [20], 
provides an analogue of the Fourier transform for functions on weighted graphs. 
In our construction, the wavelet operator at unit scale is given as an operator 
valued function T g = g(C) for a generating kernel g. Scaling is then defined 
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in the spectral domain, i.e. the operator T l g at scale t is given by g(tC). Ap- 
plying this operator to an input signal / gives the wavelet coefficients of / at 
scale t. These coefficients are equivalent to inner products of the signal / with 
the individual graph wavelets. These wavelets can be calculated by applying 
this operator to a delta impulse at a single vertex, i.e. ipt,m = Tg5 m . We show 
that this construction is analogous to the 1-d wavelet transform for a symmetric 
wavelet, where the transform is viewed as a Fourier multiplier operator at each 
wavelet scale. 

In this paper we introduce this spectral graph wavelet transform and study 
several of its properties. We show that in the fine scale limit, for sufficiently 
regular g, the wavelets exhibit good localization properties. With continuously 
defined spatial scales, the transform is analogous to the continuous wavelet 
transform, and we show that it is formally invertible subject to an admissibility 
condition on the kernel g. Sampling the spatial scales at a finite number of 
values yields a redundant, invertible transform with overcompleteness equal to 
the number of spatial scales chosen. We show that in this case the transform 
defines a frame, and give a condition for computing the frame bounds depending 
on the selection of spatial scales. 

While we define our transform in the spectral graph domain, directly com- 
puting it via fully diagonalizing the Laplacian operator is infeasible for prob- 
lems with size exceeding a few thousand vertices. We introduce a method for 
approximately computing the forward transform through operations performed 
directly in the vertex domain that avoids the need to diagonalize the Laplacian. 
By approximating the kernel g with a low dimensional Chebyshev polynomial, 
we may compute an approximate forward transform in a manner which ac- 
cesses the Laplacian only through matrix-vector multiplication. This approach 
is computationally efficient if the Laplacian is sparse, as is the case for many 
practically relevant graphs. 

We show that computation of the pseudoinverse of the overcomplete spectral 
graph wavelet transform is compatible with the Chebyshev polynomial approxi- 
mation scheme. Specifically, the pseudoinverse may be calculated by an iterative 
conjugate gradient method that requires only application of the forward trans- 
form and its adjoint, both of which may be computed using the Chebyshev 
polynomial approximation methods. 

Our paper is structured as follows. Related work is discussed in Section 
1.1. We review the classical wavelet transform in Section 2, and highlight the 
interpretation of the wavelet acting as a Fourier multiplier operator. We then set 
our notation for weighted graphs and introduce spectral graph theory in Section 
4. The spectral graph wavelet transform is defined in Section 4. In Section 5 we 
discuss and prove some properties of the transform. Section 6 is dedicated to the 
polynomial approximation and fast computation of the transform. The inverse 
transform is discussed in section 7. Finally, several examples of the transform 
on domains relevant for different problems are shown in Section 8. 
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1.1 Related Work 



Since the original introduction of wavelet theory for square integrable func- 
tions defined on the real line, numerous authors have introduced extensions 
and related transforms for signals on the plane and higher dimensional spaces. 
By taking separable products of one dimensional wavelets, one can construct 
orthogonal families of wavelets in any dimension [21]. However, this yields 
wavelets with often undesirable bias for coordinate axis directions. A large fam- 
ily of alternative multiscale transforms has been developed and used extensively 
for image processing, including Laplacian pyramids [22], steerable wavelets [23], 
complex dual-tree wavelets [24] , curvelets [25] , and bandlets [26] . Wavelet trans- 
forms have also been defined for certain non-Euclidean manifolds, most notably 
the sphere [27, 28] and other conic sections [29]. 

Previous authors have explored wavelet transforms on graphs, albeit via 
different approaches to those employed in this paper. Crovella and Kolaczyk 
[30] defined wavelets on unweighted graphs for analyzing computer network 
traffic. Their construction was based on the n-hop distance, such that the value 
of a wavelet centered at a vertex n on vertex m depended only on the shortest- 
path distance between m and n. The wavelet values were such that the sum 
over each n-hop annulus equaled the integral over an interval of a given zero- 
mean function, thus ensuring that the graph wavelets had zero mean. Their 
results differ from ours in that their construction made no use of graph weights 
and no study of the invertibility or frame properties of the transform was done. 
Smalter et. al. [31] used the graph wavelets of Crovella and Kolaczyk as part of a 
larger method for measuring structural differences between graphs representing 
chemical structures, for machine learning of chemical activities for virtual drug 
screening. 

Maggioni and Coiffmann [32] introduced "diffusion wavelets" , a general the- 
ory for wavelet decompositions based on compressed representations of powers 
of a diffusion operator. The diffusion wavelets were described with a frame- 
work that can apply on smooth manifolds as well as graphs. Their construction 
interacts with the underlying graph or manifold space through repeated ap- 
plications of a diffusion operator T, analogously to how our construction is 
parametrized by the choice of the graph Laplacian C. The largest difference 
between their work and ours is that the diffusion wavelets are designed to be or- 
thonormal. This is achieved by running a localized orthogonalization procedure 
after applying dyadic powers of T at each scale to yield nested approximation 
spaces, wavelets are then produced by locally orthogonalizing vectors spanning 
the difference of these approximation spaces. While an orthogonal transform 
is desirable for many applications, notably operator and signal compression, 
the use of the orthogonalization procedure complicates the construction of the 
transform, and somewhat obscures the relation between the diffusion operator 
T and the resulting wavelets. In contrast our approach is conceptually simpler, 
gives a highly redundant transform, and affords finer control over the selection 
of wavelet scales. 

Geller and Mayeli [33] studied a construction for wavelets on compact differ- 
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entiable manifolds that is formally similar to our approach on weighted graphs. 
In particular, they define scaling using a pseudodifferential operator tLe~ tL , 
where L is the manifold Laplace-Beltrami operator and t is a scale parameter, 
and obtain wavelets by applying this to a delta impulse. They also study the lo- 
calization of the resulting wavelets, however the methods and theoretical results 
in their paper are different as they are in the setting of smooth manifolds. 



2 Classical Wavelet Transform 

We first give an overview of the classical Continuous Wavelet Transform (CWT) 
for L 2 (M), the set of square integrable real valued functions. We will describe 
the forward transform and its formal inverse, and then show how scaling may 
be expressed in the Fourier domain. These expressions will provide an analogue 
that we will later use to define the Spectral Graph Wavelet Transform. 

In general, the CWT will be generated by the choice of a single "mother" 
wavelet ip. Wavelets at different locations and spatial scales are formed by 
translating and scaling the mother wavelet. We write this by 

f/, 3>a (x) = -v> (—) (1) 



s \ s 

This scaling convention preserves the L 1 norm of the wavelets. Other scaling 
conventions are common, especially those preserving the L 2 norm, however in 
our case the L 1 convention will be more convenient. We restrict ourselves to 
positive scales s > 0. 

For a given signal /, the wavelet coefficient at scale s and location a is given 
by the inner product of / with the wavelet ip s ,a, i-e. 



W f (s,a) = J°° V(^) f(x)dx 



(2) 



The CWT may be inverted provided that the wavelet ip satisfies the admis- 
sibility condition 

fe^ = ^<oo (3) 

CO 

This condition implies, for continuously differentiable ip, that ^(0) = J ip(x)dx = 
0, so ip must be zero mean. 

Inversion of the CWT is given by the following relation [34] 

f(*) = 7 T / W f (s,a)^ a (x) (4) 

O^, Jo J-oo s 

This method of constructing the wavelet transform proceeds by producing 
the wavelets directly in the signal domain, through scaling and translation. 
However, applying this construction directly to graphs is problematic. For a 
given function ip(x) defined on the vertices of a weighted graph, it is not obvious 
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how to define ip(sx), as if x is a vertex of the graph there is no interpretation of 
sx for a real scalar s. Our approach to this obstacle is to appeal to the Fourier 
domain. We will first show that for the classical wavelet transform, scaling can 
be defined in the Fourier domain. The resulting expression will give us a basis 
to define an analogous transform on graphs. 

For the moment, we consider the case where the scale parameter is discretized 
while the translation parameter is left continuous. While this type of transform 
is not widely used, it will provide us with the closest analogy to the spectral 
graph wavelet transform. For a fixed scale s, the wavelet transform may be 
interpreted as an operator taking the function / and returning the function 
T s f(a) = Wf(s,a). In other words, we consider the translation parameter as 
the independent variable of the function returned by the operator T s . Setting 

Mx) = V , (5) 

we see that this operator is given by convolution, i.e. 

(T s f)(a) = V (^) f(x)dx (6) 

/CO 
4) s (a - x)f(x)dx 
-co 

= C0 s */)(a) 

Taking the Fourier transform and applying the convolution theorem yields 

fVM = (7) 

Using the scaling properties of the Fourier transform and the definition (5) gives 

4m = ^*m (8) 

Combining these and inverting the transform we may write 

(T s f)(x) = ^f e^'M/Md" (9) 

In the above expression, the scaling s appears only in the argument of 
il)*(suo), showing that the scaling operation can be completely transferred to the 
Fourier domain. The above expression makes it clear that the wavelet transform 
at each scale s can be viewed as a Fourier multiplier operator, determined by 
filters that are derived from scaling a single filter ^*(uj). This can be understood 
as a bandpass filter, as ^(0) = for admissible wavelets. Expression (9) is the 
analogue that we will use to later define the Spectral Graph Wavelet Transform. 

Translation of the wavelets may be defined through "localizing" the wavelet 
operator by applying it to an impulse. Writing 5 a (x) = S(x — a), one has 

(T*5 a )(x) = V (10) 

For real valued and even wavelets this reduces to (T s 5 a )(x) = ^ a ,s{x)- 
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3 Weighted Graphs and Spectral Graph Theory 



The previous section showed that the classical wavelet transform could be de- 
fined without the need to express scaling in the original signal domain. This 
relied on expressing the wavelet operator in the Fourier domain. Our approach 
to defining wavelets on graphs relies on generalizing this to graphs; doing so re- 
quires the analogue of the Fourier transform for signals defined on the vertices 
of a weighted graph. This tool is provided by Spectral Graph Theory. In this 
section we fix our notation for weighted graphs, and motivate and define the 
Graph Fourier transform. 

3.1 Notation for Weighted Graphs 

A weighted graph G = {E, V, w} consists of a set of vertices V, a set of edges 
E, and a weight function w : E — > R + which assigns a positive weight to each 
edge. We consider here only finite graphs where \V\ = N < oo. The adjacency 
matrix A for a weighted graph G is the N x N matrix with entries a mjn where 



In the present work we consider only undirected graphs, which correspond to 
symmetric adjacency matrices. We do not consider the possibility of negative 
weights. 

A graph is said to have loops if it contain edges that connect a single vertex 
to itself. Loops imply the presence of nonzero diagonal entries in the adjacency 
matrix. As the existence of loops presents no significant problems for the theory 
we describe in this paper, we do not specifically disallow them. 

For a weighted graph, the degree of each vertex m, written as d(m), is 
defined as the sum of the weights of all the edges incident to it. This implies 
d{m) = V 

tt m,n- We define the matrix D to have diagonal elements equal to 
the degrees, and zeros elsewhere. 

Every real valued function / : V — > R on the vertices of the graph G can 
be viewed as a vector in R^, where the value of / on each vertex defines each 
coordinate. This implies an implicit numbering of the vertices. We adopt this 
identification, and will write / £ R N for functions on the vertices of the graph, 
and f(m) for the value on the m th vertex. 

Of key importance for our theory is the graph Laplacian operator C. The 
non-normalized Laplacian is defined as C = D — A. It can be verified that for 
any / £ R^, C satisfies 



where the sum over m ~ n indicates summation over all vertices n that are 
connected to the vertex m, and w m ^ n denotes the weight of the edge connecting 
m and n. 



a. 



"m,n 



w(e) if e £ E connects vertices m and n 
otherwise 



(ii) 




(12) 
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For a graph arising from a regular mesh, the graph Laplacian corresponds to 
the standard stencil approximation of the continuous Laplacian (with a differ- 
ence in sign). Consider the graph defined by taking vertices as points on 
a regular two dimensional grid, with each point connected to its four neighbors 
with weight l/(5x) 2 , where Sx is the distance between adjacent grid points. 
Abusing the index notation, for a function / = / m , n defined on the vertices, 
applying the graph Laplacian to / yields 

(^«/)m,n (^fm,n «/m+l,n fm— l,n «/m,n+l 

-fm,n-l)/(Sx) 2 (13) 

which is the standard 5-point stencil for approximating —V 2 /. 

Some authors define and use an alternative, normalized form of the Lapla- 
cian, defined as 

c norm = jj-1/2 ££)-l/2 = j_ ^-1/2^-1/2 ^ 

It should be noted that C and C norm are not similar matrices, in particular 
their eigenvectors are different. As we shall see in detail later, both operators 
may be used to define spectral graph wavelet transforms, however the resulting 
transforms will not be equivalent. Unless noted otherwise we will use the non- 
normalized form of the Laplacian, however much of the theory presented in 
this paper is identical for either choice. We consider that the selection of the 
appropriate Laplacian for a particular problem should depend on the application 
at hand. 

For completeness, we note the following. The graph Laplacian can be de- 
fined for graphs arising from sampling points on a differentiable manifold. The 
regular mesh example described previously is a simple example of such a sam- 
pling process. With increasing sampling density, by choosing the weights appro- 
priately the normalized graph Laplacian operator will converge to the intrinsic 
Laplace-Beltrami operator defined for differentiable real valued functions on the 
manifold. Several authors have studied this limiting process in detail, notably 
[35, 36, 37]. 

3.2 Graph Fourier Transform 

On the real line, the complex exponentials e lUJX defining the Fourier transform 
are eigenfunctions of the one-dimensional Laplacian operator The inverse 
Fourier transform 

f(x) = ^J fMe^dw (15) 

can thus be seen as the expansion of / in terms of the eigenfunctions of the 
Laplacian operator. 

The graph Fourier transform is defined in precise analogy to the previous 
statement. As the graph Laplacian £ is a real symmetric matrix, it has a com- 
plete set of orthonormal eigenvectors. We denote these by \i f° r ^ = 0, . . . , TV— 1, 
with associated eigenvalues \n 

£>X£ = ^tXt (16) 
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As C is symmetric, each of the A^ are real. For the graph Laplacian, it can 
be shown that the eigenvalues are all non-negative, and that appears as an 
eigenvalue with multiplicity equal to the number of connected components of 
the graph [20]. Henceforth, we assume the graph G to be connected, we may 
thus order the eigenvalues such that 

= A <Ai<A 2 ... <Aat_i (17) 

For any function / G R N defined on the vertices of G, its graph Fourier 
transform / is defined by 

N 

f(£) = (Xe,f) = J2^(n)f(n) (18) 

n=l 

The inverse transform reads as 

N-l 

f(n) = /(%(") (19) 

£=0 

The Parseval relation holds for the graph Fourier transform, in particular 
for any f,heR N 

(f,h) = (f,g). (20) 

4 Spectral Graph Wavelet Transform 

Having defined the analogue of the Fourier transform for functions defined on 
the vertices of weighted graphs, we are now ready to define the spectral graph 
wavelet transform (SGWT). The transform will be determined by the choice of 
a kernel function g : R + — > R + , which is analogous to Fourier domain wavelet 
-0* in equation 9. This kernel g should behave as a band-pass filter, i.e. it 
satisfies g(0) = and lim x ^ 00< g(x) = 0. We will defer the exact specification of 
the kernel g that we use until later. 

4.1 Wavelets 

The spectral graph wavelet transform is generated by wavelet operators that 
are operator-valued functions of the Laplacian. One may define a measurable 
function of a bounded self-adjoint linear operator on a Hilbert space using the 
continuous functional calculus [38]. This is achieved using the Spectral repre- 
sentation of the operator, which in our setting is equivalent to the graph Fourier 
transform defined in the previous section. In particular, for our spectral graph 
wavelet kernel g, the wavelet operator T g = g(C) acts on a given function / by 
modulating each Fourier mode as 

W) = g(\i)f(t) (2i) 
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Employing the inverse Fourier transform yields 

N-l 

(T 9 f)(m) = 9(^)f(£)xe(m) (22) 

£=0 

The wavelet operators at scale t is then defined by T l = g(tC). It should 
be emphasized that even though the "spatial domain" for the graph is discrete, 
the domain of the kernel g is continuous and thus the scaling may be defined 
for any positive real number t. 

The spectral graph wavelets are then realized through localizing these oper- 
ators by applying them to the impulse on a single vertex, i.e. 

= Tl5 n (23) 

Expanding this explicitly in the graph domain shows 

N-l 

= Y,^)xl(n) X £(m) (24) 

£=0 

Formally, the wavelet coefficients of a given function / are produced by 
taking the inner product with these wavelets, as 

W f (t,n) = (i H , n ,f) (25) 

Using the orthonormality of the {xi}, it can be seen that the wavelet coefficients 
can also be achieved directly from the wavelet operators, as 

N-l 

W f (t, n) = (T* g f) (n) = 9(t\i)fWxi(n) (26) 

£=0 

4.2 Scaling functions 

By construction, the spectral graph wavelets i/j t , n are all orthogonal to the null 
eigenvector xo> and nearly orthogonal to \i f° r near zero. In order to stably 
represent the low frequency content of / defined on the vertices of the graph, it 
is convenient to introduce a second class of waveforms, analogous to the lowpass 
residual scaling functions from classical wavelet analysis. These spectral graph 
scaling functions have an analogous construction to the spectral graph wavelets. 
They will be determined by a single real valued function h : R + — > R, which 
acts as a lowpass filter, and satisfies h(0) > and h(x) — > as x — > oo. The 
scaling functions are then given by (j) n = ThS n = h(C)S n , and the coefficients by 
S f (n) = (cf> n J). 

Introducing the scaling functions helps ensure stable recovery of the original 
signal / from the wavelet coefficients when the scale parameter t is sampled at 
a discrete number of values tj. As we shall see in detail in Section 5.3, stable 
recovery will be assured if the quantity G(X) = h(X) 2 + J2j=i di^j^) 2 1S bounded 
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Figure 1: Scaling function h(X) (blue curve), wavelet generating kernels g(tj\), 
and sum of squares G (black curve), for J = 5 scales, \ m ax = 10, K = 20. 
Details in Section 8.1. 

away from zero on the spectrum of C. Representative choices for h and g are 
shown in figure 1; the exact specification of h and g is deferred to Section 8.1. 

Note that the scaling functions defined in this way are present merely to 
smoothly represent the low frequency content on the graph. They do not gener- 
ate the wavelets ip through the two-scale relation as for traditional orthogonal 
wavelets. The design of the scaling function generator h is thus uncoupled from 
the choice of wavelet kernel g, provided reasonable tiling for G is achieved. 

5 Transform properties 

In this section we detail several properties of the spectral graph wavelet trans- 
form. We first show an inverse formula for the transform analogous to that for 
the continuous wavelet transform. We examine the small-scale and large-scale 
limits, and show that the wavelets are localized in the limit of small scales. Fi- 
nally we discuss discretization of the scale parameter and the resulting wavelet 
frames. 

5.1 Continuous SGWT Inverse 

In order for a particular transform to be useful for signal processing, and not 
simply signal analysis, it must be possible to reconstruct from a given set of 
transform coefficients. We will show that the spectral graph wavelet transform 
admits an inverse formula analogous to (4) for the continuous wavelet transform. 

Intuitively, the wavelet coefficient W/(t, n) provides a measure of "how much 
of" the wavelet i/j t ,n is present in the signal /. This suggests that the original 
signal may be recovered by summing the wavelets ipt,n multiplied by each wavelet 
coefficient W/(t, n). The reconstruction formula below shows that this is indeed 
the case, subject to a non-constant weight dt/t. 

Lemma 5.1. If the SGWT kernel g satisfies the admissibility condition 




(27) 
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and g(0) = 0, then 



W f {t,n)^ n {m)^=f*{m) (28) 



1 f c 

°^ n=l J ° 



where f# = f — (xo?/)Xo- ^ n particular, the complete reconstruction is then 
given by f = f* + /(0)xo- 

Proof. Using (24) and (26) to express ?/^ n and W/(t,n) in the graph Fourier 
basis, the l.h.s. of the above becomes 

^ ^ ^ E X>(^)*£' Wxr(m)J (29) 

= 7T Tl E^ A '')^)/W»NE^W»W ) * (30) 

The orthonormality of the %^ implies J2 n Xe ( n )Xf.( n ) = inserting this 

above and summing over £' gives 



y s , wo 



5 (tA<) -dt ) f(i)xt(m) (31) 



If g satisfies the admissibility condition, then the substitution u = t\g shows 
that J 9 dt = C p independent of £, except for when = at i = when the 
integral is zero. The expression (31) can be seen as the inverse Fourier transform 
evaluated at vertex m, where the i = term is omitted. This omitted term is 
exactly equal to (xo? /) Xo = / (0)xo> which proves the desired result. □ 

Note that for the non-normalized Laplacian, xo is constant on every vertex 
and above corresponds to removing the mean of /. Formula (28) shows that 
the mean of / may not be recovered from the zero- mean wavelets. The situation 
is different from the analogous reconstruction formula (4) for the CWT, which 
shows the somewhat counterintuitive result that it is possible to recover a non 
zero- mean function by summing zero-mean wavelets. This is possible on the real 
line as the Fourier frequencies are continuous; the fact that it is not possible 
for the SGWT should be considered a consequence of the discrete nature of the 
graph domain. 

While it is of theoretical interest, we note that this continuous scale recon- 
struction formula may not provide a practical reconstruction in the case when 
the wavelet coefficients may only be computed at a discrete number of scales, 
as is the case for finite computation on a digital computer. We shall revisit this 
and discuss other reconstruction methods in sections 5.3 and 7. 
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5.2 Localization in small scale limit 

One of the primary motivations for the use of wavelets is that they provide 
simultaneous localization in both frequency and time (or space). It is clear by 
construction that if the kernel g is localized in the spectral domain, as is loosely 
implied by our use of the term bandpass filter to describe it, then the associated 
spectral graph wavelets will all be localized in frequency. In order to be able to 
claim that the spectral graph wavelets can yield localization in both frequency 
and space, however, we must analyze their behaviour in the space domain more 
carefully. 

For the classical wavelets on the real line, the space localization is readily 
apparent : if the mother wavelet ip(x) is well localized in the interval [— e, e], 
then the wavelet ip t ^ a (x) will be well localized within [a — et, a+et]. In particular, 
in the limit as t — > 0, ip t ,a(x) —> for x ^ a. The situation for the spectral 
graph wavelets is less straightforward to analyze because the scaling is defined 
implicitly in the Fourier domain. We will nonetheless show that, for g sufficiently 
regular near 0, the normalized spectral graph wavelet ipt,j/ \\ipt,j II will vanish 
on vertices sufficiently far from j in the limit of fine scales, i.e. as t — > 0. This 
result will provide a quantitative statement of the localization properties of the 
spectral graph wavelets. 

One simple notion of localization for xp t ^ n is given by its value on a distant 
vertex m, e.g. we should expect ^t,n(m) to be small if n and m are separated, 
and t is small. Note that ipt,n(m) — (ipt,n, S m ) = {Tg5 n , S m ). The operator T l g = 
g(tC) is self-adjoint as C is self adjoint. This shows that ipt,n(rn) = (^nj^m) 5 
i.e. a matrix element of the operator T l g . 

Our approach is based on approximating g(tC) by a low order polynomial in 
C as t — > 0. As is readily apparent by inspecting equation (22), the operator T l 
depends only on the values of #t(A) restricted to the spectrum {A^j^J^ 1 of C. In 
particular, it is insensitive to the values of #t(A) for A > Ajv-i- If #(A) is smooth 
in a neighborhood of the origin, then as t approaches the zoomed in #t(A) can 
be approximated over the entire interval [0, Ajv-i] by the Taylor polynomial of 
g at the origin. In order to transfer the study of the localization property from 
g to an approximating polynomial, we will need to examine the stability of the 
wavelets under perturbations of the generating kernel. This, together with the 
Taylor approximation will allow us to examine the localization properties for 
integer powers of the Laplacian C. 

In order to formulate the desired localization result, we must specify a no- 
tion of distance between points m and n on a weighted graph. We will use 
the shortest-path distance, i.e. the minimum number of edges for any paths 
connecting m and n : 

dcim^n) = argmin{fci, & 2 , k s } (32) 

s 

s.t. m = fci, n = k s , and Wk r ,k r+1 > for 1 < r < s. (33) 

Note that as we have defined it, do disregards the values of the edge weights. In 
particular it defines the same distance function on G as on the binarized graph 
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where all of the nonzero edge weights are set to unit weight. 

We need the following elegant elementary result from graph theory [39]. 

Lemma 5.2. Let G be a weighted graph, with adjacency matrix A. Let B equal 
the adjacency matrix of the binarized graph, i.e. B m ^ n = if A m ^ n = ; and 
Bm, n — 1 if A m ^ n > 0. Let B be the adjacency matrix with unit loops added on 
every vertex, e.g. B m ^ n = B m ^ n for m ^ n and B m ^ n = 1 for m = n. 

Then for each s > 0, (£? s ) m , n equals the number of paths of length s con- 
necting m and n, and (B s ) m ^ n equals the number of all paths of length r < s 
connecting m and n. 

We wish to use this to show that matrix elements of low powers of the graph 
Laplacian corresponding to sufficiently separated vertices must be zero. We first 
need the following 

Lemma 5.3. Let A = (a m ^ n ) be an M x M matrix, and B = (6 m , n ) an M x M 
matrix with non-negative entries, such that £> m , n = implies A m ^ n = 0. Then, 
for all s > 0, and all m,n, (B s ) m ^ n = implies that (A s ) m ^ n = 

Proof. By repeatedly writing matrix multiplication as explicit sums, one has 



where the sum is taken over all s — 1 length sequences fci, k2...k s -i with 1 < 
k r < M. Fix 5, m and n. If (B s ) 

m.n — 0? then as every element of B is 
non-negative, every term in the above sum must be zero. This implies that for 
each term, at least one of the b q ^ r factors must be zero. This implies by the 
hypothesis that the corresponding a q ^ r factor in the corresponding term in the 
sum for (A s ) m ^ n must be zero. As this holds for every term in the above sums, 
we have (A s ) m ^ n = □ 

We now state the localization result for integer powers of the Laplacian. 

Lemma 5.4. Let G be a weighted graph, C the graph Laplacian (normalized 
or non-normalized) and s > an integer. For any two vertices m and n, if 
dcim^n) > s then (C s ) m ^ n — 0. 

Proof. Set the {0, 1} valued matrix B such that B q ^ r = if C q ^ r = 0, otherwise 
B q r = 1. The B such defined will be the same whether C is normalized or 
non- normalized. B is equal to the adjacency matrix of the binarized graph, but 
with l's on the diagonal. According to Lemma 5.2, (B s ) m ^ n equals the number 
of paths of length s or less from m to n. As dc(m,n) > s there are no such 
paths, so (B s ) rn ^ n = 0. But then Lemma 5.3 shows that (£ s ) m , n = 0. □ 

We now proceed to examining how perturbations in the kernel g affect the 
wavelets in the vertex domain. If two kernels g and g' are close to each other 
in some sense, then the resulting wavelets should be close to each other. More 
precisely, we have 




(34) 




(35) 
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Lemma 5.5. Let i/j t , n — TgS n and ip^ n = T l g , be the wavelets at scale t generated 
by the kernels g and g' . If \g(t\) — g'(t\)\ < M(t) for all X G [0, Ajv-i], then 
\ipt,n(m) — ^t, n ( m )l ^ M(t) for each vertex m. Additionally, \\^t,n — V^nl^ — 
VNM(t). 

Proof First recall that ipt,n(m) = (dm, g(t£)5 n ). Thus, 



|^, n (m) - $, n (m)| = | (5 m , (g(t£) - g\tC)) S n ) \ 

= Y,X£(m)(g(t\ £ ) - g f (t\ £ ))xKn) 

t 

<M(i)Y,\Xi{m)xt(n)*\ 



(36) 



where we have used the Parseval relation (20) on the second line. By Cauchy- 
Schwartz, the above sum over i is bounded by 1 as 



1/2 



1/2 



\Xi(m)xl(n)\ < E \Xi(m)\" £ | x J(n) 



(37) 



and \Xi( m )\ 2 = 1 f° r a U m i as the xt form a complete orthonormal basis. 
Using this bound in (36) proves the first statement. 
The second statement follows immediately as 

Ikn - ttJlt = E ^(m) - ^, n (m)) 2 < ^M(t) 2 = NM ^ 2 ( 38 ) 



□ 

We will prove the final localization result for kernels g which have a zero of 
integer multiplicity at the origin. Such kernels can be approximated by a single 
monomial for small scales. 

Lemma 5.6. Let g be K + l times continuously differentiable, satisfying g(0) = 
0; # (r) (°) = for all r < K, and g {K) (0) = C ^ 0. Assume that there is 
some t' > such that \g {K+1) (X)\ < B for all X e [0,£%r-i]. Then, for 
g'(tX) = (C/K\)(tX) K we have 



M(t) 



ag[o,a 



X K+1 

sup \g(t\)-g\t\)\<t K ^-^-B 



(39) 



for all t <t' . 



Proof. As the first K — 1 derivatives of g are zero, Tayior's formuia with re- 
mainder shows, for any values of t and A, 



g(t\) = c 



+ 9 y 



(K + iy. 



(40) 
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for some x* G [0, t\]. Nowfixt<t / . For any A G [0, Ajv-i], we have tX < t'Ajv-i, 
and so the corresponding G [0, t'Ajv-i], and so < 5. This implies 

\ s m- a 'mi<B t w ^ w <B 1 ^i (4i) 

As this holds for all A G [0, Ajv-i], taking the sup over A gives the desired 
result. □ 

We are now ready to state the complete localization result. Note that due 
to the normalization chosen for the wavelets, in general ^t,n( m ) ~^ as t — > 
for all m and n. Thus a non vacuous statement of localization must include a 
renormalization factor in the limit of small scales. 

Theorem 5.7. Let G be a weighted graph with Laplacian C. Let g be a kernel 
satisfying the hypothesis of Lemma 5.6, with constants t' and B. Let m and n 
be vertices of G such that dcim^n) > K. Then there exist constants D and t" , 
such that 



\m,n\ 

for allt < mm(t',t"). 



< Dt (42) 



Proof Set g'(X) = ^^-X K and = T*,<f n . We have 

iH = -^f 1 ^ <*m, C K S n ) = (43) 

by Lemma 5.4, as dc(m^n) > K. By the results of Lemmas 5.5 and 5.6, we 
have 

|V>t,„(m) - $, n (m)| = |Vt,n(m)| < t K+1 C (44) 

X K+1 

for C' = (x+iy m B. Writing ip t , n = + (^t, n - ^t, n ) and applying the triangle 
inequality shows 

IKnH-IKn-^nll^ll^ll ( 45 ) 

We may directly calculate 1 1 ip f t 1 1 = t K 9 1 1 C K 5 n 1 1 , and we have 1 1 ip t ^ n — ijj' t 1 1 < 
, \ K+1 

Nt K+1 (K+ly B from Lemma 5.6. These imply together that the l.h.s. of (45) 



,K + 1 



is greater than or equal to t K y 9 - K ^ | \C K 5 n 1 1 — t\fN ^f^y Bj . Together with 
(44), this shows 

||^,n|| ~ a-tb y J 

with a = 9 K \°^ \ \C K 5 n \\ and b = \/N ^~-^y B. An elementary calculation 
shows -^tt < —t if t < Tpr. This implies the desired result with D = 

a— to — a —20 1 

2CK\ , w, _ g (K) (Q)||^-?n||(g+i) 

gW(0)||£*<5„|| anCl 1 - 2vTVA^;B ' U 



17 



5.3 Spectral Graph Wavelet Frames 

The spectral graph wavelets depend on the continuous scale parameter t. For 
any practical computation, t must be sampled to a finite number of scales. 
Choosing J scales {tj}j =1 will yield a collection of NJ wavelets iptj,n, along 
with the N scaling functions (j> n . 

It is a natural question to ask how well behaved this set of vectors will be 
for representing functions on the vertices of the graph. We will address this 
by considering the wavelets at discretized scales as a frame, and examining the 
resulting frame bounds. 

We will review the basic definition of a frame. A more complete discussion 
of frame theory may be found in [40] and [41]. Given a Hilbert space H, a set 
of vectors eH form a frame with frame bounds A and B if the inequality 

A||/|| 2 <^|(/,r fc )| 2 <B||/|| 2 (47) 

k 

holds for all f eH. 

The frame bounds A and B provide information about the numerical stability 
of recovering the vector / from inner product measurements (/, T^). These 
correspond to the scaling function coefficients Sf(n) and wavelet coefficients 
Wf (tj , n) for the frame consisting of the scaling functions and the spectral graph 
wavelets with sampled scales. As we shall see later in section 7, the speed of 
convergence of algorithms used to invert the spectral graph wavelet transform 
will depend on the frame bounds. 

Theorem 5.8. Given a set of scales {tj}j =1 , the set F = {0 n }^L 1 U{V ; t j5 n}/ = i n=i 
forms a frame with bounds A, B given by 

A= min G(X) (48) 

Ag[0,Aat-i] 

B = max G(X), 

Ag[0,Aat_i] 



where G(X) = h 2 (X) + E^M) 2 - 
Proof Fix /. Using expression (26), we see 



Eiw/(^)! 2 = EE«( a «)»(«)/wEs(^)»(»)/(<') (49) 

n n £ t> 

= 5>(u,)| 2 |/WI 2 



upon rearrangement and using ^ n Xl( n )Xl'( n ) — $£,£'• Similarly, 

E i^/Mi 2 = Eiwi 2 i/wi 2 ( 5 °) 
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Denote by Q the sum of squares of inner products of / with vectors in the 
collection F. Using (49) and (50), we have 

Q = W IMA.)| 2 + E \g(t 3 \t)\ 2 J \f(e)\ 2 = G(\t)\f(\t)\ 2 (51) 

Then by the definition of A and £>, we have 

Aj^\f(£)\ 2 <Q<Bj^\f(£)\ 2 (52) 
Using the Parseval relation ||/|| 2 = J2i I /CO I 2 then gi yes the desired result. □ 



6 Polynomial Approximation and Fast SGWT 

We have defined the SGWT explicitly in the space of eigenfunctions of the graph 
Laplacian. The naive way of computing the transform, by directly using equa- 
tion (26), requires explicit computation of the entire set of eigenvectors and 
eigenvalues of C. This approach scales poorly for large graphs. General purpose 
eigenvalue routines such as the QR algorithm have computational complexity 
of 0(N 3 ) and require 0(N 2 ) memory [42]. Direct computation of the SGWT 
through diagonalizing C is feasible only for graphs with fewer than a few thou- 
sand vertices. In contrast, problems in signal and image processing routinely 
involve data with hundreds of thousands or millions of dimensions. Clearly, a 
fast transform that avoids the need for computing the complete spectrum of C is 
needed for the SGWT to be a useful tool for practical computational problems. 

We present a fast algorithm for computing the SGWT that is based on 
approximating the scaled generating kernels g by low order polynomials. Given 
this approximation, the wavelet coefficients at each scale can then be computed 
as a polynomial of C applied to the input data. These can be calculated in a 
way that accesses C only through repeated matrix-vector multiplication. This 
results in an efficient algorithm in the important case when the graph is sparse, 
i.e. contains a small number of edges. 

We first show that the polynomial approximation may be taken over a finite 
range containing the spectrum of C. 

Lemma 6.1. Let X m ax > Ajv-i be any upper bound on the spectrum of C. 
For fixed t > 0, let p(x) be a polynomial approximant of g(tx) with error 
B = sup a , G [o,A ma;r ] \d(t x ) ~ P( x )\- Then the approximate wavelet coefficients 
W f (t,n) = (p(E)f) n satisfy 

\W f (t,n)-W f (t,n)\<B\\f\\ (53) 
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Figure 2: (a) Wavelet kernel g(X) (black), truncated Chebyshev approxima- 
tion (blue) and minimax polynomial approximation (red) for degree m = 20. 
Approximation errors shown in (b), truncated Chebyshev polynomial has max- 
imum error 0.206, minimax polynomial has maximum error 0.107 . 



Proof. Using equation (26) we have 



\W f (t,n)-W f (t,n)\ 



<J2\9(t\e)-p(\e)\\f(£)xdn)\ 
i 

<B\ 



(54) 
(55) 



The last step follows from using Cauchy- Schwartz and the orthonormality of 
the x/s. □ 

Remark : The results of the lemma hold for any X max > Ajv-i- Com- 
puting extremal eigenvalues of a self- adjoint operator is a well studied problem, 
and efficient algorithms exist that access C only through matrix-vector multi- 
plication, notably Arnoldi iteration or the Jacobi-Davidson method [42, 43]. In 
particular, good estimates for Ajv-i may be computed at far smaller cost than 
that of computing the entire spectrum of C. 

For fixed polynomial degree M, the upper bound on the approximation error 
from Lemma 6.1 will be minimized if p is the minimax polynomial of degree M 
on the interval [0, A ma J. Minimax polynomial approximations are well known, 
in particular it has been shown that they exist and are unique [44]. Several 
algorithms exist for computing minimax polynomials, most notably the Remez 
exchange algorithm [45]. 

In this work, however, we will instead use a polynomial approximation given 
by the truncated Chebyshev polynomial expansion of g(tx). It has been shown 
that for analytic functions in an ellipse containing the approximation interval, 
the truncated Chebyshev expansions gives an approximate minimax polynomial 
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[46] . Minimax polynomials of order m are distinguished by having their approx- 
imation error reach the same extremal value at m + 2 points in their domain. 
As such, they distribute their approximation error across the entire interval. 
We have observed that for the wavelet kernels we use in this work, truncated 
Chebyshev polynomials result in a minimax error only slightly higher than the 
true minimax polynomials, and have a much lower approximation error where 
the wavelet kernel to be approximated is smoothly varying. A representative 
example of this is shown in Figure 2. We have observed that for small weighted 
graphs where the wavelets may be computed directly in the spectral domain, 
the truncated Chebyshev approximations give slightly lower approximation er- 
ror than the minimax polynomial approximations computed with the Remez 
algorithm. 

For these reasons, we use approximating polynomials given by truncated 
Chebyshev polynomials. In addition, we will exploit the recurrence properties 
of the Chebyshev polynomials for efficient evaluation of the approximate wavelet 
coefficients. An overview of Chebyshev polynomial approximation may be found 
in [47], we recall here briefly a few of their key properties. 

The Chebyshev polynomials T k (y) may be generated by the stable recurrence 
relation T k {y) = 2yT k _ 1 (y) - T k _ 2 (y), with T = 1 and T x = y. For y G [-1, 1], 
they satisfy the trigonometric expression T k (y) = cos (k arccos (?/)), which shows 
that each T k (y) is bounded between -1 and 1 for y G [—1,1]. The Chebyshev 
polynomials form an orthogonal basis for L 2 ([— 1, 1], . dy ), the Hilbert space 

of square integrable functions with respect to the measure dy/^/l — y 2 . In 
particular they satisfy 



1 T f (y)r m (y) ^ = U >m 7r/2 if m, I > 

V /T~^2 V \tt if m = Z = [ ) 



Every h G L 2 ([— 1, 1], — ) has a uniformly convergent Chebyshev series 



^ CO 

%) = 2 C o + Z c ^ T ^) ( 57 ) 

k=l 

with Chebyshev coefficients 

c k = - f 1 Tk ^ h ^ dy = - [* cQ8(kO)h(cQs(0))<W (58) 
^i-i V 1 "!/ 2 * Jo 

We now assume a fixed set of wavelet scales t n . For each n, approximating g(t n x) 
for x G [0, Xmax] can be done by shifting the domain using the transformation 
x = a(y + 1), with a = \ max /2. Denote the shifted Chebyshev polynomials 
T k {x) = T k {^). We may then write 

9(tnX) = 2 C n,0 + C n,kT k (x), (59) 
k=l 
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valid for x G [0, A ma J, with 



c n ,k = ~ I cos(kO)g(t n (a(cos(0) + l)))d0. (60) 



7T 







For each scale tj , the approximating polynomial pj is achieved by truncating 
the Chebyshev expansion (59) to Mj terms. We may use exactly the same 
scheme to approximate the scaling function kernel h by the polynomial po . 

Selection of the values of Mj may be considered a design problem, posing a 
trade-off between accuracy and computational cost. The fast SGWT approxi- 
mate wavelet and scaling function coefficients are then given by 

Wf(tj,n) = ^c jfi f + £ Cj , k T k {C)f j 

/, Mo _ \ 

Sf(n)= [-c , f + J2^,kT k (C)f\ 

\ k=l ) n 

(61) 



The utility of this approach relies on the efficient computation of Tk(C)f. 
Crucially, we may use the Chebyshev recurrence to compute this for each k < Mj 
accessing C only through matrix-vector multiplication. As the shifted Cheby- 
shev polynomials satisfy Tk(x) = -{x — l)Tk-i(x) — Tk-2(%), we have for any 

feR N , 

T k (C)f=l(C-I)(T k -i(C)f) -T k - 2 (C)f (62) 

Treating each vector T^{C)f as a single symbol, this relation shows that the 
vector Tk{C)f can be computed from the vectors Tk-i(C)f and T/ c _ 2 (£)/ with 
computational cost dominated by a single matrix- vector multiplication by C. 

Many weighted graphs of interest are sparse, i.e. they have a small number 
of nonzero edges. Using a sparse matrix representation, the computational cost 
of applying £ to a vector is proportional to \E\, the number of nonzero edges 
in the graph. The computational complexity of computing all of the Chebyshev 
polynomials Tk(C)f for k < M is thus 0(M\E\). The scaling function and 
wavelet coefficients at different scales are formed from the same set of Tk(C)f, 
but by combining them with different coefficients Cj^. The computation of the 
Chebyshev polynomials thus need not be repeated, instead the coefficients for 
each scale may be computed by accumulating each term of the form Cj^Tk(C)f 
as Tk(C)f is computed for each k < M. This requires O(N) operations at scale j 
for each k < Mj, giving an overall computational complexity for the fast SGWT 
of 0(M\E\+N ^2j=o ^(7)5 wnere J is the number of wavelet scales. In particular, 
for classes of graphs where \E\ scales linearly with TV, such as graphs of bounded 
maximal degree, the fast SGWT has computational complexity O(N). Note that 
if the complexity is dominated by the computation of the Tk(C)f, there is little 
benefit to choosing Mj to vary with j. 
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Applying the recurrence (62) requires memory of size 37V. The total memory 
requirement for a straightforward implementation of the fast SGWT would then 
be 7V(J + 1) + 37V. 

6.1 Fast computation of Adjoint 

Given a fixed set of wavelet scales {tj}j =1 , and including the scaling functions 
<fr n , one may consider the overall wavelet transform as a linear map W : R N — > 
R N(j+i) defined by wf = ((T h f) T ,(T%if) T r .. ,(T*'f) T ) T . Let W be the 
corresponding approximate wavelet transform defined by using the fast SGWT 
approximation, i.e. Wf = ((p (C)f) T , (pi(£)f) T , • • • ,(pj(C)f) T ) . We show 
that both the adjoint W* : R N ( J +V R N , and the composition W* W : R N 
R N can be computed efficiently using Chebyshev polynomial approximation. 
This is important as several methods for inverting the wavelet transform or 
using the spectral graph wavelets for regularization can be formulated using the 
adjoint operator, as we shall see in detail later in Section 7. 

For any r] G R Ar ^ J+1 ^ ) , we consider r] as the concatenation rj = (77^, rj^ , • • • , r]j) T 
with each rjj G R N for < j < J. Each rjj for j > 1 may be thought of as a 
subband corresponding to the scale t j , with 770 representing the scaling function 
coefficients. We then have 

j 

(V, Wf) N{J+1) = fa,, T h f) + J2 fa ' /)n 
= &o,/) + (E( T ^)*%J 

as T/j, and each Tg J are self adjoint. As (63) holds for all / G M. N , it follows that 
W*r] = T^rjo + J2j =1 Tg J rj n , i.e. the adjoint is given by re-applying the corre- 
sponding wavelet or scaling function operator on each subband, and summing 
over all scales. 

This can be computed using the same fast Chebyshev polynomial approxi- 
mation scheme in equation (61) as for the forward transform, e.g. as W*rj = 
^j = oPj(£)Vj' Note that this scheme computes the exact adjoint of the approx- 
imate forward transform, as may be verified by replacing by po(C) and Tg 3 
by pj(C) in (63). 

We may also develop a polynomial scheme for computing Naively 
computing this by first applying W, then W* by the fast SGWT would in- 
volve computing 2J Chebyshev polynomial expansions. By precomputing the 
addition of squares of the approximating polynomials, this may be reduced to 
application of a single Chebyshev polynomial with twice the degree, reducing 



T h T]o 



.7 = 1 



1 N 

(63) 
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the computational cost by a factor J. Note first that 



W*Wf = ^ Pj (jC) (pj(C)f) = ^2( Pj (C)f ) f (64) 

3=0 \j=0 

Set P(x) = T, J j=o(Pj( x )) 2 ' which has degree M* = 2max{M J }. We seek to 
express P in the shifted Chebyshev basis as P(x) = ^do + Y^k^i dkTk(x). The 
Chebyshev polynomials satisfy the product formula 

T k {x)T t (x) = X - (T k+l (x) +T\ k _n(x)) (65) 

which we will use to compute the Chebyshev coefficients d k in terms of the 
Chebyshev coefficients c^ k for the individual p/s. 

Expressing this explicitly is slightly complicated by the convention that the 
fc = Chebyshev coefficient is divided by 2 in the Chebyshev expansion (59). 
For convenience in the following, set c'- k = c^ k for fc > 1 and Cj = \cj^, 

so that Pj (x) = J2^=o c j,k T k(x). Writing (pj(x)) 2 = Ylt*J£ n d ' jjk T k (x), and 
applying (65), we compute 

lko 2 + E,=o^ 2 ) if* = o 

dj,k = \ \ fei=0 C j,i c/ j,k-i + Si=0 C j,i c/ j,k+i + Si-fe C j,i C j,i-/e) if < fc < M, 

kl^fJk-M^jAk-i) HM j <k<2M j 

(66) 

Finally, setting d n ^ = 2d^ and d^ k = d^ fc for fc > 1, and setting d k — 
J2j=odj,k gives the Chebyshev coefficients for P(x). We may then compute 

1 M * 

W*Wf = P(C)f = -d / + E d * T *(£)/ ( 67 ) 



/c=l 



following (61). 



7 Reconstruction 

For most interesting signal processing applications, merely calculating the wavelet 
coefficients is not sufficient. A wide class of signal processing applications are 
based on manipulating the coefficients of a signal in a certain transform, and 
later inverting the transform. For the SGWT to be useful for more than simply 
signal analysis, it is important to be able to recover a signal corresponding to a 
given set of coefficients. 

The SGWT is an overcomplete transform as there are more wavelets iptj,n 
than original vertices of the graph. Including the scaling functions <\> n in the 
wavelet frame, the SGWT maps an input vector / of size N to the N{ J + 1) 
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coefficients c = Wf. As is well known, this means that W will have an infinite 
number of left-inverses M s.t. MWf = f. A natural choice among the possible 
inverses is to use the pseudoinverse L = (W*W)~ X W*. The pseudoinverse 
satisfies the minimum-norm property 

Lc = argmin||c- Wf\\ 2 (68) 

feR N 

For applications which involve manipulation of the wavelet coefficients, it is very 
likely to need to apply the inverse to a a set of coefficients which no longer lie 
directly in the image of W. The above property indicates that, in this case, 
the pseudoinverse corresponds to orthogonal projection onto the image of W, 
followed by inversion on the image of W. 

Given a set of coefficients c, the pseudoinverse will be given by solving the 
square matrix equation (W*W)f = W*c. This system is too large to invert di- 
rectly. Solving it may be performed using any of a number of iterative methods, 
including the classical frame algorithm [40], and the faster conjugate gradients 
method [48]. These methods have the property that each step of the compu- 
tation is dominated by application of to a single vector. We use the 
conjugate gradients method, employing the fast polynomial approximation (67) 
for computing application of W"*W\ 



8 Implementation and examples 

In this section we first give the explicit details of the wavelet and scaling function 
kernels used, and how we select the scales. We then show examples of the 
spectral graph wavelets on several different real and synthetic data sets. 



8.1 SGWT design details 

Our choice for the wavelet generating kernel g is motivated by the desire to 
achieve localization in the limit of fine scales. According to Theorem 5.7, local- 
ization can be ensured if g behaves as a monic power of x near the origin. We 
choose g to be exactly a monic power near the origin, and to have power law 
decay for large x. In between, we set g to be a cubic spline such that g and g' 
are continuous. Our g is parametrized by the integers a and /?, and x\ and x 2 
determining the transition regions : 



g(x',a,/3,x 1 ,x 2 ) 



x^[ a x a for x < x\ 

s(x) for x\ < x < X2 (69) 
x^x~P for x > X2 



Note that g is normalized such that g{x\) = g(x2) = 1. The coefficients of 
the cubic polynomial s(x) are determined by the continuity constraints s(xi) = 
s(x2) = 1 , s'(xi) = a/xi and s'(x2) = —/3/x2. All of the examples in this 
paper were produced using a = (3 = 1, x± = 1 and x 2 = 2; in this case 
s(x) = -5 + Ux - 6x 2 + x 3 . 
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Figure 3: Spectral graph wavelets on Swiss Roll data cloud, with J = 4 wavelet 
scales, (a) vertex at which wavelets are centered (b) scaling function (c)-(f) 
wavelets, scales 1-4. 



The wavelet scales tj are selected to be logarithmically equispaced between 
the minimum and maximum scales tj and t\. These are themselves adapted to 
the upper bound X max of the spectrum of C. The placement of the maximum 
scale t\ as well as the scaling function kernel h will be determined by the selection 
of Xmin — Xmax/K, where K is a design parameter of the transform. We then 
set t\ so that g(t\x) has power-law decay for x > Xmim and set tj so that 
g(tjx) has monic polynomial behaviour for x < Xmax- This is achieved by 
t\ — X2/X min and tj = X2/X max . 

For the scaling function kernel we take h(x) = 7exp( — ( 06 ^ ) 4 ), where 7 
is set such that h(0) has the same value as the maximum value of g. 

This set of scaling function and wavelet generating kernels, for parameters 
Xmax = 10, K = 20, a = (3 = 2, x\ = 1, x<i = 2, and J = 4, are shown in Figure 
1. 
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8.2 Illustrative examples : spectral graph wavelet gallery 



As a first example of building wavelets in a point cloud domain, we consider the 
spectral graph wavelets constructed on the "Swiss roll" . This example data set 
consists of points randomly sampled on a 2-d manifold that is embedded in R 3 . 
The manifold is described parametrically by x(s, t) = (t cos(£)/47r, s, t sin(t)/47r) 
for— l<s<l,7r<t< 4-7T. For our example we take 500 points sampled 
uniformly on the manifold. 

Given a collection X{ of points, we build a weighted graph by setting edge 
weights Wij = exp(— \ \xj — Xj\\ 2 /2a 2 ). For larger data sets this graph could 
be sparsified by thresholding the edge weights, however we do not perform this 
here. In Figure 3 we show the Swiss roll data set, and the spectral graph 
wavelets at four different scales localized at the same location. We used a = 0.1 
for computing the underlying weighted graph, and J = 4 scales with K = 20 for 
computing the spectral graph wavelets. In many examples relevant for machine 
learning, data are given in a high dimensional space that intrinsically lie on 
some underlying lower dimensional manifold. This figure shows how the spectral 
graph wavelets can implicitly adapt to the underlying manifold structure of the 
data, in particular notice that the support of the coarse scale wavelets diffuse 
locally along the manifold and do not "jump" to the upper portion of the roll. 

A second example is provided by a transportation network. In Figure 4 we 
consider a graph describing the road network for Minnesota. In this dataset, 
edges represent major roads and vertices their intersection points, which often 
but not always correspond to towns or cities. For this example the graph is un- 
weighted, i.e. the edge weights are all equal to unity independent of the physical 
length of the road segment represented. In particular, the spatial coordinates 
of each vertex are used only for displaying the graph and the corresponding 
wavelets, but do not affect the edge weights. We show wavelets constructed 
with K = 100 and J = 4 scales. 

Graph wavelets on transportation networks could prove useful for analyzing 
data measured at geographical locations where one would expect the under- 
lying phenomena to be influenced by movement of people or goods along the 
transportation infrastructure. Possible example applications of this type in- 
clude analysis of epidemiological data describing the spread of disease, analysis 
of inventory of trade goods (e.g. gasoline or grain stocks) relevant for logistics 
problems, or analysis of census data describing human migration patterns. 

Another promising potential application of the spectral graph wavelet trans- 
form is for use in data analysis for brain imaging. Many brain imaging modali- 
ties, notably functional MRI, produce static or time series maps of activity on 
the cortical surface. Functional MRI imaging attempts to measure the differ- 
ence between "resting" and "active" cortical states, typically by measuring MRI 
signal correlated with changes in cortical blood flow. Due to both constraints 
on imaging time and the very indirect nature of the measurement, functional 
MRI images typically have a low signal-to-noise ratio. There is thus a need 
for techniques for dealing with high levels of noise in functional MRI images, 
either through direct denoising in the image domain or at the level of statistical 
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(d) (e) (f) 

Figure 4: Spectral graph wavelets on Minnesota road graph, with K = 100, 
J = 4 scales, (a) vertex at which wavelets are centered (b) scaling function 
(c)-(f) wavelets, scales 1-4. 

hypothesis testing for defining active regions. 

Classical wavelet methods have been studied for use in fMRI processing, 
both for denoising in the image domain [49] and for constructing statistical 
hypothesis testing [50, 51]. The power of these methods relies on the assumption 
that the underlying cortical activity signal is spatially localized, and thus can 
be efficiently represented with localized wavelet waveforms. However, such use 
of wavelets ignores the anatomical connectivity of the cortex. 

A common view of the cerebral cortex is that it is organized into distinct 
functional regions which are interconnected by tracts of axonal fibers. Recent 
advances in diffusion MRI imaging, notable diffusion tensor imaging (DTI) and 
diffusion spectrum imaging (DSI), have enabled measuring the directionality 
of fiber tracts in the brain. By tracing the fiber tracts, it is possible to non- 
invasively infer the anatomical connectivity of cortical regions. This raises an 
interesting question of whether knowledge of anatomical connectivity can be 
exploited for processing of image data on the cortical surface. 

We 1 have begun to address this issue by implementing the spectral graph 

-^n collaboration with Dr Leila Cammoun and Prof. Jean-Philippe Thiran, EPFL, Lau- 
sanne, Dr Patric Hagmann and Prof. Reto Meuli, CHUV, Lausanne 
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(d) (e) (f) 

Figure 5: Spectral graph wavelets on cerebral cortex, with K = 50, J = 4 scales, 
(a) ROI at which wavelets are centered (b) scaling function (c)-(f) wavelets, 
scales 1-4. 

wavelets on a weighted graph which captures the connectivity of the cortex. 
Details of measuring the cortical connection matrix are described in [52]. Very 
briefly, the cortical surface is first subdivided into 998 Regions of Interest 
(ROI's). A large number of fiber tracts are traced, then the connectivity of 
each pair of ROI's is proportional to the number of fiber tracts connecting 
them, with a correction term depending on the measured fiber length. The re- 
sulting symmetric matrix can be viewed as a weighted graph where the vertices 
are the ROI's. Figure 5 shows example spectral graph wavelets computed on 
the cortical connection graph, visualized by mapping the ROI's back onto a 3d 
model of the cortex. Only the right hemisphere is shown, although the wavelets 
are defined on both hemispheres. For future work we plan to investigate the 
use of these cortical graph wavelets for use in regularization and denoising of 
functional MRI data. 

A final interesting application for the spectral graph wavelet transform is 
the construction of wavelets on irregularly shaped domains. As a representative 
example, consider that for some problems in physical oceanography one may 
need to manipulate scalar data, such as water temperature or salinity, that 
is only defined on the surface of a given body of water. In order to apply 
wavelet analysis for such data, one must adapt the transform to the potentially 
very complicated boundary between land and water. The spectral wavelets 
handle the boundary implicitly and gracefully. As an illustration we examine 
the spectral graph wavelets where the domain is determined by the surface of a 
lake. 
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Figure 6: Spectral graph wavelets on lake Geneva domain, (spatial map (a), 
contour plot (c)); compared with truncated wavelets from graph corresponding 
to complete mesh (spatial map (b), contour plot (d)). Note that the graph 
wavelets adapt to the geometry of the domain. 

For this example the lake domain is given as a mask defined on a regular 
grid. We construct the corresponding weighted graph having vertices that are 
grid points inside the lake, and retaining only edges connecting neighboring 
grid points inside the lake. We set all edge weights to unity. The corresponding 
graph Laplacian is thus exactly the 5-point stencil (13) for approximating the 
continuous operator —V 2 on the interior of the domain; while at boundary points 
the graph Laplacian is modified by the deletion of edges leaving the domain. 
We show an example wavelet on Lake Geneva in Figure 6. Shoreline data was 
taken from the GSHHS database [53] and the lake mask was created on a 256 
x 153 pixel grid using an azimuthal equidistant projection, with a scale of 232 
meters/pixel. The wavelet displayed is from the coarsest wavelet scale, using 
the generating kernel described in 8.1 with parameters K = 100 and J = 5 
scales. 

For this type of domain derived by masking a regular grid, one may compare 
the wavelets with those obtained by simply truncating the wavelets derived 
from a large regular grid. As the wavelets have compact support, the true 
and truncated wavelets will coincide for wavelets located far from the irregular 
boundary. As can be seen in Figure 6, however, they are quite different for 
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wavelets located near the irregular boundary. This comparison gives direct 
evidence for the ability of the spectral graph wavelets to adapt gracefully and 
automatically to the arbitrarily shaped domain. 

We remark that the regular sampling of data within the domain may be 
unrealistic for problems where data are collected at irregularly placed sensor 
locations. The spectral graph wavelet transform could also be used in this case 
by constructing a graph with vertices at the sensor locations, however we have 
not considered such an example here. 

9 Conclusions and Future Work 

We have presented a framework for constructing wavelets on arbitrary weighted 
graphs. By analogy with classical wavelet operators in the Fourier domain, we 
have shown that scaling may be implemented in the spectral domain of the 
graph Laplacian. We have shown that the resulting spectral graph wavelets are 
localized in the small scale limit, and form a frame with easily calculable frame 
bounds. We have detailed an algorithm for computing the wavelets based on 
Chebyshev polynomial approximation that avoids the need for explicit diago- 
nalization of the graph Laplacian, and allows the application of the transform to 
large graphs. Finally we have shown examples of the wavelets on graphs arising 
from several different potential application domains. 

There are many possible directions for future research for improving or ex- 
tending the SGWT. One property of the transform presented here is that, unlike 
classical orthogonal wavelet transforms, we do not subsample the transform at 
coarser spatial scales. As a result the SGWT is overcomplete by a factor of 
J+l where J is the number of wavelet scales. Subsampling of the SGWT can 
be determined by selecting a mask of vertices at each scale corresponding to 
the centers of the wavelets to preserve. This is a more difficult problem on an 
arbitrary weighted graph than on a regular mesh, where one may exploit the 
regular geometry of the mesh to perform dyadic subsampling at each scale. An 
interesting question for future research would be to investigate an appropriate 
criterion for determining a good selection of wavelets to preserve after subsam- 
pling. As an example, one may consider preserving the frame bounds as much 
as possible under the constraint that the overall overcompleteness should not 
exceed a specified factor. 

A related question is to consider how the SGWT would interact with graph 
contraction. A weighted graph may be contracted by partitioning its vertices 
into disjoint sets; the resulting contracted graph has vertices equal to the number 
of partitions and edge weights determined by summing the weights of the edges 
connecting any two partitions. Repeatedly contracting a given weighted graph 
could define a multiscale representation of the weighted graph. Calculating a 
single scale of the spectral graph wavelet transform for each of these contracted 
graphs would then yield a multiscale wavelet analysis. This proposed scheme 
is inspired conceptually by the fast wavelet transform for classical orthogonal 
wavelets, based on recursive filtering and subsampling. The question of how 
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to automatically define the contraction at each scale on an arbitrary irregular 
graph is itself a difficult research problem. 

The spectral graph wavelets presented here are not directional. In particular 
when constructed on regular meshes they yield radially symmetric waveforms. 
This can be understood as in this case the graph Laplacian is the discretization 
of the isotropic continuous Laplacian. In the field of image processing, however, 
it has long been recognized that directionally selective filters are more efficient 
at representing image structure. This raises the interesting question of how, 
and when, graph wavelets can be constructed which have some directionality. 
Intuitively, this will require some notion of local directionality, i.e. some way of 
defining directions of all of the neighbors of a given vertex. As this would require 
the definition of additional structure beyond the raw connectivity information, it 
may not be appropriate for completely arbitrary graphs. For graphs which arise 
from sampling a known orientable manifold, such as the meshes with irregular 
boundary used in Figure 6, one may infer such local directionality from the 
original manifold. 

For some problems it may be useful to construct graphs that mix both local 
and non-local connectivity information. As a concrete example consider the 
cortical graph wavelets shown in Figure 5. As the vertices of the graph corre- 
spond to sets of MRI voxels grouped into ROI's, the wavelets are defined on the 
ROI's and thus cannot be used to analyze data defined on the scale of individual 
voxels. Analyzing voxel scale data with the SGWT would require constructing 
a graph with vertices corresponding to individual voxels. However, the nonlocal 
connectivity is defined only on the scale of the ROI's. One way of defining the 
connectivity for the finer graph would be as a sum A nonlocal + A loca \ where 
A 7 ^ r ^ ocal is the weight of the connection between the ROI containing vertex m 
and the ROI containing vertex n, and A l £ c % 1 indexes whether m and n are spatial 
neighbors. Under this scheme we consider A local as implementing a "default" 
local connectivity not arising from any particular measurement. Considering 
this raises interesting questions of how to balance the relative contributions of 
the local and nonlocal connectivities, as well as how the special structure of the 
hybrid connectivity matrix could be exploited for efficient computation. 

The particular form of the wavelet generating kernel g used in the examples 
illustrating this work was chosen in a somewhat ad- hoc manner. Aside from 
localization in the small-scale limit which required polynomial behaviour of g at 
the origin, we have avoided detailed analysis of how the choice of g affects the 
wavelets. In particular, we have not chosen g and the choice of spatial scales to 
optimize the resulting frame bounds. More detailed investigation is called for 
regarding optimizing the design of g for different applications. 

The fast Chebyshev polynomial approximation scheme we describe here 
could itself be useful independent of its application for computing the wavelet 
transform. One application could be for filtering of data on irregularly shaped 
domains, such as described in Figure 6. For example, smoothing data on such a 
domain by convolving with a Gaussian kernel is confounded by the problem that 
near the edges the kernel would extend off of the domain. As an alternative, one 
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could express the convolution as multiplication by a function in the Fourier do- 
main, approximate this function with a Chebyshev polynomial, and then apply 
the algorithm described in this paper. This could also be used for band-pass or 
high-pass filtering of data on irregular domains, by designing appropriate filters 
in the spectral domain induced by the graph Laplacian. 

The Chebyshev approximation scheme may also be useful for machine learn- 
ing problems on graphs. Some recent work has studied using the "diffusion 
kernel" K t = e~ tC for use with kernel-based machine learning algorithms [54]. 
The Chebyshev polynomial scheme provides a fast way to approximate this 
exponential that may be useful for large problems on unstructured yet sparse 
graphs. 
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